{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "125c55c0-4d76-4d2b-960e-7a62cfe1b05a",
   "metadata": {},
   "source": [
    "# Access to data\n",
    "\n",
    "NablaDFT includes three types of databases:\n",
    "\n",
    "1. **Energy database.** There are molecule structure, energy, and forces. Data are available via the atomic simulation environment (ASE) interface.\n",
    "2. **Hamiltonian database.** There are molecule structure, energy, forces, hamiltonian and overlap matrix. Data are available via nabla2DFT custom access interface.\n",
    "3. **Raw psi4 wave function.** There are serialized Psi4 wavefunction. Data are available via psi4 or numpy interfaces.\n",
    "\n",
    "Each database has specific atom units, order of records, and order of atomic orbitals in Hamiltonians. In this tutorial, we show how to load and visualize some data. Advanced processing of metadata and Hamiltonians are described in the following lessons."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f150db6f-1f27-4682-b969-a189dfc9c00b",
   "metadata": {},
   "source": [
    "The smallest split (`train-tiny`,  51 Mb) of the energy database is available with `wget`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "58e3f443-a553-4198-a5af-2406fec58aa1",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!wget https://a002dlils-kadurin-nabladft.obs.ru-moscow-1.hc.sbercloud.ru/data/nablaDFTv2/energy_databases/train_2k_v2_formation_energy_w_forces.db"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1299dd89-eccf-46b5-908c-b61c4963b42e",
   "metadata": {},
   "source": [
    "An atomic simulation environment (ASE) package is necessary for processing energy databases. It also helps to visualize molecules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b480883-e15b-4d70-b5fc-8f810dfd26f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import ase\n",
    "from ase.db import connect\n",
    "from ase.units import Bohr\n",
    "from ase.visualize import view"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "2c1defd3-5406-437d-ac7c-a5e5b0010318",
   "metadata": {},
   "outputs": [],
   "source": [
    "with connect(\"train_2k_v2_formation_energy_w_forces.db\") as train_db:\n",
    "    atom_row = train_db.get(1)\n",
    "    row = atom_row.toatoms()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb819cf0-32c4-477b-b8b7-2743c14f2f4a",
   "metadata": {},
   "source": [
    "Energy databases store atom positions in Angstrom."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "364b62c9-fe9f-406e-8507-c16c546129ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "row.numbers, row.positions"
   ]
  },
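  {
   "cell_type": "markdown",
   "id": "energy-db-inspect-note",
   "metadata": {},
   "source": [
    "To get a feel for what the energy database holds, the following minimal sketch counts the records and prints a short summary of the first three rows. It uses only the standard ASE database calls `count` and `select`. The names stored under `row.data` are not assumed here; they are simply listed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "energy-db-inspect-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ase.db import connect\n",
    "\n",
    "with connect(\"train_2k_v2_formation_energy_w_forces.db\") as db:\n",
    "    print(\"records:\", db.count())\n",
    "    # Summarize the first few rows: id, formula, atom count, stored data keys\n",
    "    for r in db.select(limit=3):\n",
    "        print(r.id, r.formula, r.natoms, sorted(r.data.keys()))"
   ]
  },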
  {
   "cell_type": "markdown",
   "id": "de2a570b-f53a-420e-80f4-70592ccff27b",
   "metadata": {},
   "source": [
    "You can check the data using visualization. You can check the data using visualization. ASE takes physical units from meta information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "3a63c949-995a-4fc1-8288-2ed1aae4aaf3",
   "metadata": {},
   "outputs": [],
   "source": [
    "view(row, viewer=\"x3d\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62225c37-0097-4082-a954-cb15a879026d",
   "metadata": {},
   "source": [
    "# Hamiltonian database"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6a66855-a4ff-445b-8fd6-7ab4d814158b",
   "metadata": {},
   "source": [
    "The smallest split of the Hamiltonian database (`train-tiny`) requires downloading of 14 Gb."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "e217fa14-8271-4cdf-bebf-78bd6275bc22",
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget https://a002dlils-kadurin-nabladft.obs.ru-moscow-1.hc.sbercloud.ru/data/nablaDFTv2/hamiltonian_databases/train_2k.db"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b8ab3bb-3625-4d3a-bfb8-d2802874a5ea",
   "metadata": {},
   "source": [
    "We provide a custom class to access the Hamiltonian databases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "e18af462-df20-48b1-8483-57d798233745",
   "metadata": {},
   "outputs": [],
   "source": [
    "from nablaDFT.dataset import HamiltonianDatabase"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "27fc3254-c06d-42f9-879a-2ac8c891efd7",
   "metadata": {},
   "outputs": [],
   "source": [
    "help(HamiltonianDatabase)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "e12e8e66-7a39-44b6-9b5a-a3b0f88558c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "train = HamiltonianDatabase(\"train_2k.db\")\n",
    "# atoms numbers, atoms positions, energy, forces, core hamiltonian, overlap matrix, coefficients matrix\n",
    "Z, R, E, F, H, S, C, moses_id, conformation_id = train[0]"
   ]
  },
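  {
   "cell_type": "markdown",
   "id": "hamiltonian-shapes-note",
   "metadata": {},
   "source": [
    "A quick way to see how the returned arrays relate to each other is to print their shapes. The sketch below assumes that `H`, `S`, and `C` are returned as NumPy arrays, which is not stated above. Under that assumption, `H` and `S` would be expected to be square matrices over the same basis, and both should be symmetric."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "hamiltonian-shapes-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Assumption: H, S and C are NumPy arrays taken from train[0] above\n",
    "print(\"atoms:\", len(Z))\n",
    "print(\"H:\", H.shape, \"S:\", S.shape, \"C:\", C.shape)\n",
    "print(\"S symmetric:\", np.allclose(S, S.T))\n",
    "print(\"H symmetric:\", np.allclose(H, H.T))"
   ]
  },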
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "639eb14b-1720-48d6-8afa-0af954acaf77",
   "metadata": {},
   "outputs": [],
   "source": [
    "Z, R"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "668f714b-b7d6-4290-aef9-7d9f51a4f0d8",
   "metadata": {},
   "source": [
    "ASE cannot take physical units from metainformation. Angstrom to Bohr should be converted explicitly. You can check the correctness of units using ASE atomic visualization. If atomic units are correct, then each atom will drawn as a sphere. The spheres should touch each other but not intersect or stay aside."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "3a381c35-32b5-4f1c-b2d2-3f6dc3aa41ce",
   "metadata": {},
   "outputs": [],
   "source": [
    "atom = ase.Atoms(Z, R * Bohr)\n",
    "view(atom, viewer=\"x3d\")"
   ]
  },
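  {
   "cell_type": "markdown",
   "id": "units-distance-check-note",
   "metadata": {},
   "source": [
    "The visual check can be complemented with a numeric one. The sketch below compares the shortest interatomic distance with and without the conversion; `min_distance` is a hypothetical helper written for this example. Assuming the molecule has ordinary covalent bonds, the converted value should be on the order of 1 Angstrom, while the unconverted value is larger by a factor of about 1.9."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "units-distance-check-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import ase\n",
    "import numpy as np\n",
    "from ase.units import Bohr\n",
    "\n",
    "\n",
    "def min_distance(atoms):\n",
    "    \"\"\"Shortest distance between two distinct atoms (hypothetical helper).\"\"\"\n",
    "    d = atoms.get_all_distances()\n",
    "    # Keep only the upper triangle to skip the zero diagonal\n",
    "    return d[np.triu_indices(len(atoms), k=1)].min()\n",
    "\n",
    "\n",
    "print(\"converted (R * Bohr):\", min_distance(ase.Atoms(Z, R * Bohr)))\n",
    "print(\"unconverted (R):     \", min_distance(ase.Atoms(Z, R)))"
   ]
  },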
  {
   "cell_type": "markdown",
   "id": "5b565edd-1054-4322-8f5b-f53c4bf81806",
   "metadata": {},
   "source": [
    "# Raw psi4 wave function"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5eb7f7fc-ed8d-4604-874a-bc0d3eb8ed40",
   "metadata": {},
   "source": [
    "You can upload PSI4 wavefunctions into the PSI4, or into numpy. Numpy-way is simpler because it is not required to install Psi4."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a07fdf6e-7915-4c86-922c-7cf901c8e5d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget https://a002dlils-kadurin-nabladft.obs.ru-moscow-1.hc.sbercloud.ru/data/moses_wfns_big/wfns_moses_conformers_archive_0.tar\n",
    "!tar -xf wfns_moses_conformers_archive_0.tar\n",
    "!cd mnt/sdd/data/moses_wfns_big/\n",
    "!ls mnt/sdd/data/moses_wfns_big/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "e4616bb4-8265-436a-9478-a43c951f655d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "03ab2e26-b106-42df-82bf-449457fef623",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = np.load(\"mnt/sdd/data/moses_wfns_big/wfn_conf_50000_0.npy\", allow_pickle=True).tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "24119e0b-b786-4cfe-8cde-558ba2238c38",
   "metadata": {},
   "outputs": [],
   "source": [
    "Z = data[\"molecule\"][\"elez\"]\n",
    "R = data[\"molecule\"][\"geom\"].reshape((-1, 3))\n",
    "Z, R"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "10c6ef49-756d-437f-a8c6-8eb848021cfa",
   "metadata": {},
   "outputs": [],
   "source": [
    "atom = ase.Atoms(Z, R)\n",
    "view(atom, viewer=\"x3d\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "d8aec537-ef15-4f15-903b-7ec2cef076e3",
   "metadata": {},
   "outputs": [],
   "source": [
    "wfn = np.load(\"mnt/sdd/data/moses_wfns_big/wfn_conf_50000_0.npy\", allow_pickle=True).tolist()\n",
    "orbital_matrix_a = wfn[\"matrix\"][\"Ca\"]  # alpha orbital coefficients\n",
    "orbital_matrix_b = wfn[\"matrix\"][\"Cb\"]  # betta orbital coefficients\n",
    "density_matrix_a = wfn[\"matrix\"][\"Da\"]  # alpha electonic density\n",
    "density_matrix_b = wfn[\"matrix\"][\"Db\"]  # betta electonic density\n",
    "aotoso_matrix = wfn[\"matrix\"][\"aotoso\"]  # atomic orbital to symmetry orbital transformation matrix\n",
    "core_hamiltonian_matrix = wfn[\"matrix\"][\"H\"]  # core Hamiltonian matrix\n",
    "fock_matrix_a = wfn[\"matrix\"][\"Fa\"]  # DFT alpha Fock matrix\n",
    "fock_matrix_b = wfn[\"matrix\"][\"Fb\"]  # DFT betta Fock matrix"
   ]
  },
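  {
   "cell_type": "markdown",
   "id": "wfn-matrix-listing-note",
   "metadata": {},
   "source": [
    "The matrices above are only a selection. As a minimal sketch, the loop below lists every entry stored under `wfn[\"matrix\"]` together with its shape, so you can see what else the file provides. Entries without a `shape` attribute (for example, unset ones) are printed as `None`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "wfn-matrix-listing-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "wfn = np.load(\"mnt/sdd/data/moses_wfns_big/wfn_conf_50000_0.npy\", allow_pickle=True).tolist()\n",
    "# Print the name and shape of every stored matrix\n",
    "for name, matrix in sorted(wfn[\"matrix\"].items()):\n",
    "    print(name, getattr(matrix, \"shape\", None))"
   ]
  },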
  {
   "cell_type": "markdown",
   "id": "d69157a8-311e-40e1-ba86-918de5e25645",
   "metadata": {},
   "source": [
    "An advenced processing of wavefunctions and data is available from psi4 also. Note, that psi4 require compilation or conda install. More information about obtaining of PSI4 is available here https://psicode.org/psi4manual/master/build_obtaining.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "458fc854-655d-4a1f-9f11-9071e76a5f2c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import psi4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13afde2b-e584-4013-8af5-2be0ee1d5c41",
   "metadata": {},
   "outputs": [],
   "source": [
    "wfn = psi4.core.Wavefunction.from_file(\"mnt/sdd/data/moses_wfns_big/wfn_conf_50000_0.npy\")\n",
    "psi4.oeprop(wfn, \"MAYER_INDICES\")\n",
    "psi4.oeprop(wfn, \"WIBERG_LOWDIN_INDICES\")\n",
    "psi4.oeprop(wfn, \"MULLIKEN_CHARGES\")\n",
    "psi4.oeprop(wfn, \"LOWDIN_CHARGES\")\n",
    "meyer_bos = wfn.array_variables()[\"MAYER INDICES\"]  # Mayer bond indices\n",
    "lodwin_bos = wfn.array_variables()[\"WIBERG LOWDIN INDICES\"]  # Wiberg bond indices\n",
    "mulliken_charges = wfn.array_variables()[\"MULLIKEN CHARGES\"]  # Mulliken atomic charges\n",
    "lowdin_charges = wfn.array_variables()[\"LOWDIN CHARGES\"]  # Löwdin atomic charges"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "docs2",
   "language": "python",
   "name": "docs2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
